Digital signal processing apparatus.

A digital signal processing apparatus which is used for computation in coding image signals or the like,
and a motion compensative operation method which uses a digital signal processing apparatus. The apparatus
comprises a plurality of signal processing means arranged in parallel and control means which assigns loads to
the signal processing means so that the signal processing means have even computation volumes. Alternatively,
an address generator is provided for each of data sets entered independently. An intermediate check is
conducted during the computation for a block which involves a motion compensative operation.

DIGITAL SIGNAL PROCESSING APPARATUS



BACKGROUND OF THE INVENTION 



Field of the Invention

The present invention relates to a digital signal processing apparatus which performs computational 
processes for digital signals. 






Description of the Prior Art 



Fig. 1 shows the multiprocessor system described in an article entitled "A Real Time Video Signal
Processor Suitable for Motion Picture Coding Applications", IEEE GLOBECOM '87, p. 453. In Fig. 1, input
data 1 is received by a data transfer controller 3, and thereafter data 4 are transferred selectively to digital
signal processors 2, i.e. DSP-1 through DSP-N, in block-1. After being processed by the respective DSPs
in block-1, resultant data 5 is transferred to block-2 and processed by the respective DSPs for the next
processing step.

Fig. 2(a) shows divided memory areas of the DSPs. For simplicity of explanation, shown here is an
example of parallel processing using three DSPs 2, to which process areas A, B and C are assigned evenly.
In the inter-frame image coding system and the like, it is a general convention to employ the conditional
pixel supplementary process, in which only portions having at least a certain difference between the input
frame and the previous frame are coded and previous frame data is used for the remaining portions.
Accordingly, the volume of computation needed for the process differs depending on the valid pixel rate
even though the number of pixels in the process area is constant. The volume of computation or
computation time needed is proportional to the valid pixel rate.

In the inter-frame image coding system or the like, assuming that the valid pixels are distributed
among the DSPs as EA, EB and EC as shown in Fig. 2(b), the computation time needed for
one block of the parallel DSP configuration is determined by the process time of the DSP which works on the
area B with the largest volume of process M, and the remaining DSPs, which have finished the areas A and
C earlier, have idle time.
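
The effect of an uneven valid pixel distribution can be illustrated with a minimal sketch. It assumes, as stated above, that the process time of each DSP is proportional to its number of valid pixels; the function name, the cost factor and the pixel counts are hypothetical and are not taken from the cited article.

```python
def block_time(valid_pixels, cycles_per_valid_pixel=1):
    """Return (block time, idle time per DSP) for a fixed area assignment.

    valid_pixels: number of valid pixels in the area of each DSP.
    The block finishes only when the most heavily loaded DSP finishes.
    """
    times = [n * cycles_per_valid_pixel for n in valid_pixels]
    longest = max(times)
    idle = [longest - t for t in times]
    return longest, idle


# Hypothetical valid pixel counts for the areas A, B and C.
longest, idle = block_time([200, 900, 400])
print("block time:", longest)
print("idle time per DSP:", idle)
```

In this illustration the DSPs assigned to the areas A and C wait for the DSP assigned to the area B, which is the idle time referred to above.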

The conventional digital signal processing apparatus arranged as described above has its overall
process time determined by the longest process time among the DSPs when the density of information, such
as the valid pixel rate, within a frame is uneven and the distribution of information varies with time, resulting
in a degraded process efficiency per DSP unit.

Fig. 3 is a diagram showing, as an example, the arrangement of another digital signal processing
apparatus disclosed in an article entitled "Realtime Video Signal Processor Module", in the proceedings of
ICASSP '87, pp. 1961 - 1964, April 1987, Dallas, U.S.A. In the figure, indicated by 1 is an input terminal, 4
is an input bus for distributing input data on the input terminal 1, 28a is a feedback bus for distributing the
result of the previous process, and 20 are signal processing modules each including an input storage 21, a
processing unit 22, an output storage 23 and a timing control unit 24. Indicated by 25 are wired-OR circuits
through which feedback data on output ports 30 are placed on the feedback bus 28a, 26 are wired-OR
circuits through which output data on output ports 29 are delivered to the output terminal 5 over the output
bus 5a, 27 are input ports for the input data to the signal processing module 20, and 28 are input ports for
the feedback data to the signal processing module 20.

Fig. 4 is a block diagram showing in more detail one of the signal processing modules in Fig. 3. In the
figure, indicated by 221 is an address generator (AGU A), 211 is an input dual memory (MEM A) which
receives data on the input port 27 over the input bus 4, 212 is an input dual memory (MEM B) which
receives data on the feedback bus 28a by way of the input port 28, 222 is an address generator (AGU B),
223 is an X-bus, 224 is a Y-bus, and 225 is a pipeline arithmetic unit (PAU) having its input terminal EX1
connected to the X-bus 223 and another input terminal EX2 connected to the Y-bus 224. Indicated by 226
is a data memory [MEM P(Q)] having its output connected to the X-bus 223, 227 is an address generator
[AGU P(Q)] having its output connected to the Y-bus 224 and data memory 226, 228 is a mode register
(MDR) having its output connected to the X-bus 223 and Y-bus 224, and 241 is a Z-bus connected to the
inputs of the address generators 221, 222 and 227, pipeline arithmetic unit 225 and data memory 226.
Indicated by 242 is a sequencer (SEQ), 243 is an instruction memory (IRAM) connected to the output of the
sequencer 242, and 245 is a decoder (DEC) connected to the output of the instruction memory 243, with
the output of the decoder 245 being connected to the Z-bus 241 and output bus 231. The output bus 231 is
connected to the input of the mode register 228 and the Z-bus 241. Indicated by 232 is an FIFO memory
(MEM C) connected to the output bus 231, 233 is an FIFO (MEM D) connected to the output bus 231, 29 is
an output port of the FIFO memory 232, and 30 is an output port of the FIFO memory 233.

Fig. 5 is a diagram showing, as an example, the algorithm of a typical high-efficiency coder for a moving
image. In the figure, indicated by 250 is an input terminal for the input video signal, 251 is an input frame
buffer having at least a 1-frame capacity and having the simultaneous read-write ability, 252 is an inter-frame
subtracter for evaluating the difference, 253 is a block identifier, 254 is a coder, 255 is a coding
parameter produced by the coder 254, 256 is a variable-length coder, 257 is a video multiplexer, 258 is a
transmission buffer memory, and 259 is an output terminal for the coded data. Connected in cascade
between the input terminal 250 and output terminal 259 are the above-mentioned functional blocks 251 -
254 and 256 - 258. Further indicated by 260 is a local decoder which receives the coding parameter 255,
261 is an inter-frame adder, 262 is an in-loop filter, 263 is a coding frame memory, 264 is previous coded
frame data, 265 is a motion compensator, 266 is current frame data fed from the input frame buffer 251 to
the motion compensator 265, 267 is motion vector data, 268 is compensated previous frame data fed from
the motion compensator 265 to the inter-frame subtracter 252 and inter-frame adder 261, 269 is a feedback
signal, and 270 is a coding controller which provides coding control information for the video multiplexer
257, a feed-forward signal to the input frame buffer 251, a block identification control signal 273 to the block
identifier 253, and a coding control signal 274 to the variable-length coder 256.

Next, the operation of the conventional digital signal processing apparatus will be described in
connection with Fig. 3. This apparatus is intended for moving image processing and is based on the
division parallel processing system in which a frame is divided into small frames and a signal processing
module 20 is assigned to each of the divided frame areas.

Initially, each signal processing module 20 operates on an autonomous basis by expending one video
frame time to fetch the divided frame area assigned to it among the input data transferred frame-wise in
raster scanning over the input bus 4 and to store the data in the input storage 21. At the same time, if the
process result of the previous frame is needed for the current process, it operates by expending one video
frame time to fetch data of the assigned area of the frame in the feedback data from the input port 28 over
the feedback bus 28a and stores the data in the input storage 21.

Upon expiration of one video frame time, the processing unit 22 performs the prescribed signal
processing for the input data and feedback data stored in the input storage 21, and stores the result
temporarily in the output storage 23. The feedback data led out of the output storage 23 through the output
port 30 is timed for synchronization with other signal processing modules 20 and, by being merged into all
feedback data by the wired-OR circuit 25, placed on the feedback bus 28a. Similarly, the output data led
out of the output storage 23 through the output port 29 is timed for synchronization with other signal
processing modules 20 and, by being merged into all output data by the wired-OR circuit 26, delivered to
the output terminal 5 over the output bus 5a.

Divided frame areas processed individually by the signal processing modules 20 are combined back into
a video frame. Parallel processing of the area division type is thereby realized. For the reason described
above, it is necessary for all signal processing modules 20 to have their process commencement in
complete synchronism with one another. On this account, the timing control unit 24 provides all sections of
the system with the timing of data input/output and process commencement in synchronism with the video
frame timing, which is the synchronization reference point.

Next, the operation of one signal processing module 20 will be described briefly in connection with Fig. 4. Among
a video frame entered frame-wise through the input port 27 in synchronism with the video frame sync
signal, data of the assigned area is stored in the input dual memory 211. At the same time, among the
coded previous frame data entered through the input port 28, the portion of the assigned area and its
peripheral data are stored in the input dual memory 212.

Each of the input dual memories 211 and 212 is made up of a two-sided memory device with the same structure
on both sides, and it operates such that while data is written to one side, the other side is connected to the
X-bus 223 and Y-bus 224 for reading for the coding process by the pipeline arithmetic unit 225. The
read/write sides of the input dual memories 211 and 212 are switched by the above-mentioned video frame
sync signal so that input data of assigned areas on the input ports 27 and 28 are entered frame-wise
uninterruptedly.

The data read out to the X-bus 223 and Y-bus 224 are those stored at data memory addresses
indicated to the input dual memories 211 and 212 by the address generators 221 and 222, which are
controlled by the signals that the decoder 245 provides by decoding 80-bit horizontal-type microcode
read out in accordance with the address of the instruction memory 243 indicated by the sequencer 242. The
data placed on the X-bus 223 and Y-bus 224 are entered in parallel to the pipeline arithmetic unit 225,
which implements a series of signal processing including coding and local decoding and outputs the result
to the Z-bus 241. Among the process outputs placed on the Z-bus, the coded output is stored in the FIFO
memory 232 and the local decoded output is stored in the FIFO memory 233 by way of the output bus 231.

The FIFO memories 232 and 233 are buffer memories of FIFO configuration. Feedback data consisting
of the output data and local decoded data are read out of the output ports 29 and 30 at the read control
timing for the assigned area produced from the video frame signal, and a piece of video frame local
decoded data and coded output data in compliance with the scanning order are produced.

The data memory 226, which is controlled by the output of the address generator 227, is used as a work
memory which is necessary for the process of the pipeline arithmetic unit 225 and as a table which stores
constants. The mode register 228 consists of a register file including registers for loading immediate values
from the decoder 245.

This digital signal processing apparatus is principally based on the foregoing area division parallel
processing, and is intended such that each signal processing module 20 deals with a divided frame area
independently on a realtime basis. When the digital signal processing apparatus is intended for the
achievement of a coder as shown in Fig. 5, only portions excluding the variable-length coder 256, video
multiplexer 257, transmission buffer 258 and coding controller 270 can be realized. Namely, it is not
suitable for a continuous process in one video frame, and is limited to the inter-frame coding loop process
ranging from the input frame buffer 251 to the block identifier 253, coder 254, local decoder 260, coding
frame memory 263, and to the motion compensator 265 useful for data completely divisible within a frame.

Since each signal processing module 20 implements the same process for each frame, the processing
program stored in the instruction memory 243 can be a single program. When a frame is divided into M
areas (M is an integer greater than or equal to 1), the number of process cycles $N_c$ per pixel which can be
dealt with on a realtime basis by one signal processing module 20 is given by the following calculation,

$$N_c = \frac{M_c \cdot T_f}{M_p \cdot N_p} \quad \text{(clocks/pixel)}$$

where $M_c$ is the frequency of machine cycle (Hz), $T_f$ is the frame period (sec), $M_p$ is the number of
horizontal pixels in the assigned area, and $N_p$ is the number of vertical pixels in the assigned area.

On this account, if a frame is divided into four areas, for example, each assigned to a
signal processing module 20, the number of process cycles $N_c$ is increased fourfold, and it becomes
possible for video signal processing, which is required to be very fast, to be dealt with on a realtime
basis by an increased number of relatively slow signal processing modules 20.
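
The fourfold increase can be checked numerically with a short sketch of the above calculation. The machine cycle frequency, frame period and frame size used here are hypothetical values chosen only for illustration; the function name is likewise an assumption.

```python
def cycles_per_pixel(mc_hz, tf_sec, mp, np_):
    """Process cycles available per pixel: Nc = (Mc * Tf) / (Mp * Np)."""
    return (mc_hz * tf_sec) / (mp * np_)


# Hypothetical figures: 10 MHz machine cycle, 30 frames per second,
# and a 352 x 288 pixel frame handled by a single module.
whole = cycles_per_pixel(10e6, 1 / 30, 352, 288)

# The same frame divided into four areas of 176 x 144 pixels each.
quarter = cycles_per_pixel(10e6, 1 / 30, 176, 144)

print("undivided frame:", whole)
print("one of four areas:", quarter)
print("ratio:", quarter / whole)
```

Halving both the horizontal and the vertical pixel counts of the assigned area quarters the denominator, so the cycles available per pixel increase by a factor of four.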

The conventional digital signal processing apparatus arranged as described above has the following
problems in processing video signals.

(a) For the achievement of very fast processing, a frame must be divided into numerous small areas;
however, certain signal processing algorithms do not allow independent processes for areas below a certain
minimal division size. Therefore, in such cases realtime processing cannot be achieved by increasing the parallelism.

(b) Because of a fixed distribution of load to signal processing modules, the process time must be
set to meet the longest one when each signal processing module has a different process time. Therefore,
the system has an unnecessarily increased parallelism relative to the processing capacity.

(c) Data input and data processing each take one frame time, and data input and output each need a
1-frame buffer memory, resulting in a longer time lag and an increased memory capacity. Therefore, the
system involves a significant loop delay in feedback control and the like, and it is difficult to realize the
coding controller 270 in Fig. 5, for example.

(d) Since the system is intended for complete parallel processing, it cannot perform such a process
as scanning the entirety of a same frame horizontally.

Fig. 6 is a block diagram of the conventional digital signal processing system disclosed in the
proceedings (No. S10-1) of the 1986 annual convention of the communication department of The Institute of
Electronics and Communication Engineers of Japan. In the figure, indicated by 31 is a dual-port internal
data memory (will be termed 2P-RAM) capable of reading and writing two sets of data simultaneously, 32 is
an address generator which calculates the address of read data or write data, 33 is a data bus used for the
internal transfer of data related to computation, 34 and 35 are selectors which select data in the 2P-RAM
31, 36 is a register which holds computation data selected by the selector 34, 37 is a register which holds
computation data selected by the selector 35, 38 is a multiplier, 39 is a register which holds the output of
the multiplier 38, 40 is a selector which selects the output of the register 36 or accumulators (ACC0 -
ACC3) 44, 41 is a selector which selects the output of the registers 39 or 37, 42 is an arithmetic/logic unit
which performs computations for the outputs of the selectors 40 and 41, and 43 is a selector which selects
the output of the arithmetic/logic unit 42 or data in an external data register 46. The accumulators 44 are
used to hold the output of the arithmetic/logic unit 42 for cumulative computations. The external data
register 46 is to hold data from an external data memory 47. Indicated by 45 is an external address register
which holds address data provided by the address generator 32 and transfers it to the external data
memory 47.

Next, the operation will be described. This signal processing system based on a digital signal processor
performs command fetching and decoding for the preset microprogram, data reading, computation, and
computation result writing, in a parallel pipeline processing mode. The following describes the operation of
3-input-1-output computation.

The arithmetic/logic unit, multiplier, address generator, data memories and selectors are controlled in
the microcommand mode.

Arithmetic operations for two inputs, including addition, subtraction, maximum evaluation, minimum
evaluation, etc., are expressed generically by $a \oplus b$, and a multiplication operation for two inputs is expressed
generically by $a \times b$, where a and b are independent data.

The arithmetic operations and multiplication are combined to form 3-input-1-output operations, and they
are defined by the following expressions,

$$z_i = (a_i \oplus b_i) \times c_i \qquad (1)$$

$$z_i = (a_i \times b_i) \oplus c_i \qquad (2)$$

where i = 1 to N, and $a_i$, $b_i$ and $c_i$ are sets of independent data stored in the 2P-RAM 31.
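
The two forms of the 3-input-1-output operation can be written out as a short sketch. The generic two-input arithmetic operation is passed in as a function (addition by default, with `max` as another instance); the function names and sample data are hypothetical and serve only to illustrate expressions (1) and (2).

```python
import operator


def op_form1(a, b, c, alu=operator.add):
    """Expression (1): z_i = (a_i <alu> b_i) * c_i for i = 1..N."""
    return [alu(ai, bi) * ci for ai, bi, ci in zip(a, b, c)]


def op_form2(a, b, c, alu=operator.add):
    """Expression (2): z_i = (a_i * b_i) <alu> c_i for i = 1..N."""
    return [alu(ai * bi, ci) for ai, bi, ci in zip(a, b, c)]


# Hypothetical data sets A, B and C with N = 2.
a, b, c = [1, 2], [3, 4], [5, 6]
print(op_form1(a, b, c))                    # addition as the generic operation
print(op_form2(a, b, c))
print(op_form1(a, b, c, alu=operator.sub))  # subtraction as the generic operation
print(op_form1(a, b, c, alu=max))           # maximum evaluation
```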

Fig. 7 shows the sequence of process for implementing the 3-input operation of the form of expression 
(1) by the digital signal processing system, for example, shown in Fig. 6. 

The data address generator 32 sets up the starting addresses for two data sets A and B, and selects
the simple incremental mode. Then the two data sets A and B are loaded through the selectors 34 and 35
into the registers 36 and 37. The selectors 40 and 41 select the registers 36 and 37, respectively, so that
the arithmetic/logic unit 42 implements the arithmetic operation $a_i \oplus b_i$. The selector 43 selects the
arithmetic/logic unit 42 to hold the operation result temporarily in one of the accumulators (ACC0 - ACC3) 44,
and the resultant data is sent over the data bus 33 and through the external data register 46 and stored in the
external data memory 47, whose addressing mode is the simple incremental mode because it is linked to
one of the addresses for the 2P-RAM 31 in the address generator 32.

In the subsequent step ST3, the data address generator 32 sets up the starting addresses of the data
set C and data set $a_i \oplus b_i$, and $c_i$ data is read out of the 2P-RAM 31 to the register 36. The selector 35
selects the data bus to load the data of $a_i \oplus b_i$ in the external memory 47 into the register 37. In this case, in
order to have a coincident timing of reading for the data set C and data set $a_i \oplus b_i$, step ST4 needs to
expend two cycles of useless command reading for the external memory in advance.

The two sets of data are multiplied by the multiplier 38 in step ST5, and the result is
stored in the register 39. In the next cycle, the resultant data is passed through the arithmetic/logic unit 42
and, after being held temporarily in one of the accumulators (ACC0 - ACC3) 44, transferred over the data
bus 33 to the 2P-RAM 31.

These operations are carried out in parallel on the basis of the pipeline process, and the operations 
from the reading of 2P-RAM 31 until the storing of the process result in the external memory 47 for N 
pieces of data sets will take N + 3 machine cycles in the case of an arithmetic operation. 

The steps of operations are listed in the following Table 1 and Table 2. Table 1 is for the operation of
$a_i \oplus b_i$ and the transfer of the result to the external memory 47, and Table 2 is for reading the resultant
$a_i \oplus b_i$ from the external memory 47, the operation of $(a_i \oplus b_i) \times c_i$, and the transfer of the result to the 2P-RAM.
In both tables, symbol "x" represents an indefinite value. Storing in the external data register 46 completes
in machine cycle N + 3 in both tables, and the external data register 46 is read uselessly in machine cycle
0 (two machine cycles) in Table 2.

Next, after two useless reading cycles of the external memory 47 for timing purposes, multiplication is
carried out for N pieces of data sets and the results are stored in the 2P-RAM 31. These operations take N
+ 3 machine cycles, to which two command cycles for address initialization are added, and a total of 2N
+ 10 cycles are expended. An operation of expression (2) also takes 2N + 10 cycles. Accordingly, it will
be appreciated that if a 3-input-1-output operation is conducted for N pieces of data sets using a processor
with the ability of 2-input operation at most, it will take about 2N machine cycles (provided that N is
sufficiently large).

The following describes the cumulative operation for the results of the foregoing 3-input-1-output
computation.

$$S = \sum_{i=1}^{N} (a_i \oplus b_i) \times c_i \qquad (3)$$

$$S = \sum_{i=1}^{N} (a_i \times b_i) \oplus c_i \qquad (4)$$

In the case of expression (3), the multiplication result for $a_i \oplus b_i$ and $c_i$ (output of register 39) and the
intermediate cumulative value are entered to the arithmetic/logic unit 42, and the result of summation is
entered back to the same accumulator 44 through the selector 43. Thereby, the process takes 2N + 10
cycles unchanged.

In the case of expression (4), the data sets $(a_i \times b_i) \oplus c_i$ which have been stored temporarily in the
2P-RAM 31 are read out sequentially and summed by the arithmetic/logic unit 42, and therefore the process
needs another N cycles, resulting in a total of 3N + 10 cycles.
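
The cycle counts quoted above can be tabulated with a small sketch. It only restates the totals given in the text (2N + 10 cycles for expressions (1) to (3), and 3N + 10 cycles for expression (4)); the function names are hypothetical, and the per-step breakdown of the constant term is not modelled.

```python
def cycles_two_pass(n):
    """Total cycles quoted for expressions (1), (2) and (3): 2N + 10."""
    return 2 * n + 10


def cycles_sum_form4(n):
    """Total cycles quoted for expression (4): 3N + 10."""
    return 3 * n + 10


for n in (16, 256, 4096):
    two_pass = cycles_two_pass(n)
    form4 = cycles_sum_form4(n)
    # Cycles per data set approach 2 and 3 respectively as N grows.
    print(n, two_pass, form4, two_pass / n, form4 / n)
```

For large N the constant term becomes negligible, which is why the text speaks of about 2N machine cycles for the two-pass operation.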

The conventional digital signal processing system is formed as described above, and therefore for a
3-input-1-output operation of three independent data sets, it performs a 2-input-1-output operation twice.
In addition, the process time is further extended for address control, memory transfer and other processes.

Fig. 8 is a diagram showing in brief the image coding transmitter which implements the conventional
motion compensative operation method disclosed in an article entitled "Dynamic Multistage Vector
Quantization for Images", Journal of The Institute of Electronics and Communication Engineers of Japan,
Vol. J68-B, No. 1, pp. 68 - 76, Jan. 1985. In the figure, indicated by 1 is an input signal of image data
formed of a plurality of consecutive frames on the time axis, 52 is a motion compensator which produces a
prediction signal on the basis of the resemblance computation of correlation between the current frame
represented by the input signal 1 and the previous frame represented by a previous frame signal 53 which
is the previous reduced signal 1, 54 is motion vector information provided by the motion compensator 52
indicative of the position of a prediction signal block, 55 is a prediction signal produced by the motion
compensator 52, 56 is a coder which codes the difference between the input signal 1 and prediction signal
55, 57 is a decoder which decodes the signal coded by the coder 56, and 58 is a frame memory which
stores data reproduced through the summation of the signal from the decoder 57 and the signal from the
motion compensator 52.

The performance of the foregoing arrangement will be described in connection with Fig. 9. The motion
compensation process is to calculate, for the input signal 1, the amount of distortion between a 11-by-12
block located in a specific position in the current frame shown in Fig. 9(A) and M pieces of blocks in the
search range S in the previous frame shown in Fig. 9(B), to evaluate the position of the block y providing a
minimal distortion relative to the position of the input block, i.e., motion vector V, and to recognize the signal
of the minimal distortion block as a prediction signal.

The number of motion vectors V under search within the search range S in the given frame is assumed
to be M (an integer greater than 1). The amount of distortion at the position of a specific motion vector V
between the previous frame blocks and the current input block is calculated as a sum of absolute values of
differences as follows.

$$d_i = \sum_{h=1}^{K} |y_{ih} - x_h| \qquad (5)$$

where the input vector is $x = \{x_1, x_2, \ldots, x_K\}$, the search object blocks are
$y_i = \{y_{i1}, y_{i2}, \ldots, y_{iK}\}$, i = 1, 2, ..., M, and M
and K are fixed values. The motion vector V is evaluated as follows.

$$V = V_i \{ \min d_i \mid i = 1, 2, \ldots, M \} \qquad (6)$$

Fig. 10 shows the sequence of operations for detecting the motion vector V. Step ST11 calculates a
distortion $d_i$ over the K sampling points on the basis of expression (5), and the next step ST12
compares $d_i$ with the minimal distortion D at position I and, if $d_i < D$, the variables are replaced to be D
= $d_i$ and I = i. These operations are repeated for the number of search vectors, i.e., the operational
process of expression (6), to determine the final minimal distortion D and its position I.
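
The search of Fig. 10 can be sketched as follows, assuming each block is given as a flat list of K pixel values. The function names and sample values are hypothetical; the code merely restates expressions (5) and (6) with the compare-and-update step of ST12.

```python
def distortion(x, y):
    """Expression (5): sum of absolute differences over the K pixels."""
    return sum(abs(yh - xh) for xh, yh in zip(x, y))


def full_search(x, candidates):
    """Expression (6): return (I, D), the 1-based index of the candidate
    block with minimal distortion and the distortion itself."""
    best_i, best_d = None, float("inf")
    for i, y in enumerate(candidates, start=1):
        d = distortion(x, y)      # step ST11
        if d < best_d:            # step ST12: compare and update
            best_i, best_d = i, d
    return best_i, best_d


# Hypothetical input block (K = 4) and M = 3 search object blocks.
x = [1, 2, 3, 4]
candidates = [[0, 0, 0, 0], [1, 2, 3, 5], [4, 3, 2, 1]]
print(full_search(x, candidates))
```

Every candidate costs K difference-and-accumulate steps here, so the total work is proportional to K times M.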

These operations must be completed within the period of each frame entered successively, and
therefore a high-speed digital signal processor is required.

As an example, the digital signal processing system shown in Fig. 6 is used to carry out the motion
compensation process. In this case, the multiplication-sum operation takes place K x M times for each input
block, and the number of machine cycles is the total time expended by M times of processes including
comparison and updating. Generally, the number of cycles for comparison and updating is small enough as
compared with that of the multiplication-sum operation, and the volume of motion compensation operation
for one block is virtually equal to K x M machine cycles.

However, since the time available for these operations is determined by the period of frames
entered successively, parallel processing will be needed for the mass multiplication-sum operations to be
performed in a short time, depending on the operation process cycle time of a particular digital signal
processor.

The conventional motion compensation scheme is implemented as described above, and in order to
secure the operation time for an enormous volume of operations when carried out using a digital signal
processor, processors need to operate in parallel, resulting in an increased complexity and scale
of hardware structure.



SUMMARY OF THE INVENTION 



The present invention is intended to overcome the foregoing prior art deficiencies, and a prime object is
to provide a digital signal processing apparatus which uses the multiprocessor parallel configuration to its
maximal processing ability.

Another object of this invention is to provide a digital signal processing apparatus which works
efficiently with fewer processors and less memory capacity, while ensuring the latitude of the signal
processing algorithm.

Still another object of this invention is to provide a digital signal processing apparatus which eliminates
the need for address control for storing the intermediate result and for transfer to the memory, thereby
executing fast 3-input-1-output operation.

A further object of this invention is to provide a motion compensative operation method which, in
constructing the motion compensator of an image coding system with a digital signal processing apparatus,
requires fewer parallel processors, thereby enhancing the simplicity and compactness of the
hardware structure.

In order to achieve the above objectives, the inventive digital signal processing apparatus comprises a
plurality of processors and a task controller which issues address control signals of task assignment to the
processors so that they fetch an even quantity of significant information to their memories for processing.
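
One way to picture such an even assignment is the following sketch. It assumes, purely for illustration, that the frame is assigned in contiguous groups of rows and that the number of valid pixels per row is known in advance; the greedy boundary rule, the function name and the sample counts are hypothetical and are not stated in this specification.

```python
def assign_rows(valid_per_row, n_processors):
    """Split rows into contiguous areas with roughly equal valid pixel totals.

    A processor's area is closed once the cumulative valid pixel count
    reaches that processor's share of the total (illustrative rule only).
    """
    target = sum(valid_per_row) / n_processors
    areas = [[] for _ in range(n_processors)]
    p, load = 0, 0
    for row, valid in enumerate(valid_per_row):
        areas[p].append(row)
        load += valid
        # Move to the next processor when the cumulative share is reached,
        # keeping the last processor for all remaining rows.
        if p < n_processors - 1 and load >= target * (p + 1):
            p += 1
    return areas


# Hypothetical valid pixel counts for eight rows, shared by three processors.
rows = [5, 5, 30, 40, 30, 5, 5, 0]
areas = assign_rows(rows, 3)
print(areas)
print([sum(rows[r] for r in area) for area in areas])
```

With these sample counts the areas differ in size but each carries the same number of valid pixels, in contrast to the fixed even-area assignment of Fig. 2.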

Furthermore, the inventive digital signal processing apparatus comprises a plurality of signal processors
connected with one another through a dual-port memory arranged in series and/or parallel, a shared
memory which can be accessed for reading and writing in blocks of signal processing or in an arbitrary
number of data from all of the signal processors, a task table which stores the process status in the signal
processors, and a data flow controller which scans the contents of the task table at a certain interval,
determines a charged process of each signal processor on the basis of feedback data including the
occupancy rate of buffer memory reported by the output controller, and directs the interrupt controller of
each signal processor to start.

Furthermore, the inventive digital signal processing apparatus comprises first through third data
reading address generators adapted to read three independent data sets independently and simultaneously,
and a pair of an arithmetic unit and a multiplier adapted to execute a 3-input-1-output arithmetic operation at high
speed by each receiving the output of its counterpart.

The inventive motion compensation method using a digital signal processing apparatus divides a
current input frame of digital image data, which consists of a plurality of frames entered successively, into a
plurality of blocks, searches the previous input frame for a pattern which resembles the block of the input
frame, and implements a coding process with a block of minimal distortion in highest resemblance as a
prediction signal, wherein in detecting a block of minimal distortion through the computation of inter-pattern
resemblance using the difference and cumulation of pixels in each block between the block of the current
input frame and blocks of M in number (M is a positive integer) in the previous input frame, the method
uses, for the pattern resemblance computation, a maximum of K pieces of pixels (K is an integer greater
than 0 and less than or equal to a total number of pixels in a block), implements an intermediate check n
times (n is an integer greater than 0) during the computation of resemblance at time points when the
number of reference pixels is smaller than K, skips the computation for remaining pixels when a cumulative
value at each time point of intermediate check is greater than a threshold value which is set for each time
point of intermediate check and excludes the block from the range of comparison for finding a minimal
distortion block, and detects by computation the minimal distortion block among blocks in which cumulative
values are below the thresholds at all time points of intermediate check.
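
The intermediate check can be sketched as below, assuming each block is a flat list of K pixel values and the distortion is a sum of absolute differences. The mapping from check points to thresholds, the function name and the sample values are hypothetical; how the thresholds are actually chosen is not covered by this sketch.

```python
def search_with_checks(x, candidates, checkpoints):
    """Minimal distortion search with intermediate checks.

    checkpoints maps a pixel count h (smaller than K) to a threshold.
    A candidate whose cumulative value exceeds the threshold at a check
    point is abandoned and excluded from the comparison.
    Returns (1-based index, distortion), or (None, inf) if every
    candidate was excluded.
    """
    best_i, best_d = None, float("inf")
    for i, y in enumerate(candidates, start=1):
        acc, rejected = 0, False
        for h, (xh, yh) in enumerate(zip(x, y), start=1):
            acc += abs(yh - xh)
            if h in checkpoints and acc > checkpoints[h]:
                rejected = True   # skip the remaining pixels of this block
                break
        if not rejected and acc < best_d:
            best_i, best_d = i, acc
    return best_i, best_d


# Hypothetical data: K = 4, M = 3, one check (n = 1) after two pixels.
x = [1, 2, 3, 4]
candidates = [[0, 0, 0, 0], [1, 2, 3, 5], [4, 3, 2, 1]]
print(search_with_checks(x, candidates, {2: 2}))
```

In this example the first and third candidates are abandoned after two pixels, so only the second candidate is accumulated over all K pixels.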



BRIEF DESCRIPTION OF THE DRAWINGS 



Fig. 1 is a block diagram showing the multiprocessor system of a conventional digital signal 
processing apparatus; 

Fig. 2 is a diagram explaining the assigned areas of the processors shown in Fig. 1; 
Fig. 3 is a block diagram showing the arrangement of another conventional digital signal processing
apparatus;

Fig. 4 is a block diagram showing in detail the arrangement of the signal processing module shown in
Fig. 3;

Fig. 5 is a block diagram showing the algorithm of the high-efficiency coder for a moving image;
Fig. 6 is a block diagram showing the arrangement of a third conventional digital signal processing
apparatus;

Fig. 7 is a flowchart showing the process of 3-input arithmetic operation using the digital signal 
processing apparatus shown in Fig. 6; 

Fig. 8 is a block diagram showing in brief the arrangement of the image coding transmitter which 
carries out the conventional motion compensative operation method; 
Fig. 9 is a diagram used to explain the conventional motion compensative operation method;

Fig. 10 is a flowchart showing the operational process for detecting a motion vector in the 
conventional motion compensative operation method; 

Fig. 11 is a block diagram showing the digital signal processing apparatus based on the first 
embodiment of this invention; 
Fig. 12 is a diagram explaining the area assignment for the processors shown in Fig. 11;

Fig. 13 is a block diagram showing the arrangement of the digital signal processing apparatus formed
by connecting in cascade a plurality of digital signal processors (DSP blocks) shown in Fig. 11;

Fig. 14 is a diagram showing the concept of process of each DSP block shown in Fig. 13; 

Fig. 15 is a block diagram showing the digital signal processing apparatus based on the second
embodiment of this invention;

Fig. 16 is a block diagram showing the internal arrangement of the signal processor shown in Fig. 15; 

Fig. 17 is a diagram explaining the concept of control operation of the digital signal processing
apparatus shown in Fig. 15;

Fig. 18 is a diagram explaining the relation between parameter data and processing block data in the
digital signal processing apparatus shown in Fig. 15;

Fig. 19 is a diagram showing the correspondence between data blocks and a frame;

Fig. 20 is a block diagram of the arrangement in which a plurality of digital signal processors are 
included in the digital signal processing apparatus shown in Fig. 15; 

Fig. 21 is a block diagram showing the digital signal processing apparatus based on another
embodiment of this invention;

Fig. 22 is a flowchart showing the operational process of the digital signal processing apparatus
shown in Fig. 21;

Fig. 23 is a flowchart showing an embodiment of the inventive motion compensative operation 
method using a digital signal processing apparatus; 
Fig. 24 is a diagram used to explain the method of intermediate check for the computation of
distortion in the inventive motion compensative operation method; and
Fig. 25 is a diagram showing the arrangement of pixel samples at sampling points in a block
according to the intermediate check method for the distortion computation.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Specific embodiments of the present invention will now be described with reference to the drawings.
Fig. 11 shows, as an embodiment of this invention, an example of the image coder of the digital signal
processing apparatus. In the figure, input data 1 is entered to first through third input memories 6. A task
controller 7 estimates the number of valid pixels on the basis of the contents of the input memories 6,
determines the distribution of the coding process among first, second and third DSPs 2, and issues control
signals as address control signals 8 to the DSPs 2. Upon receiving the address control signals 8, the first,
second and third DSPs 2 issue addresses 9 to the respective first, second and third input memories 6 to fetch
the data 10 assigned for processing, and implement the coding processes based on the preset program. Upon
completion of the processes, the first, second and third DSPs 2 store processed data in an output memory 11,
which, after reading the whole data of the DSP block, sends the processed data to the next DSP block.

In this case, each DSP 2 is controlled by the task controller 7 so that all DSPs 2 have even numbers of
valid pixels assigned, and therefore the image coding process time is controlled so that the difference of
process times among the DSPs 2 is minimal. Namely, in case of coding an image with numbers of valid
pixels as shown in Fig. 12(b), an area A having a relatively small number of valid pixels is enlarged to A', an
area C having a relatively large number of valid pixels is also enlarged to C', and an area B having a larger
number of valid pixels is reduced to B', as shown in Fig. 12(a), by the task controller 7. The task controller
7 issues the address control signals 8 corresponding to the assignment distribution to the first, second and
third DSPs 2.

For example, in response to the issuance of the address control signal 8 for coding the image data of
area A' to the first DSP 2, the first DSP 2 produces the address 9 for the area A' in the first input memory 6
to fetch data and implements the image coding process by following the prescribed program. Similarly, the
second and third DSPs 2 are directed to carry out the image coding processes for the areas B' and C', respectively.
Consequently, the first, second and third DSPs 2 have their numbers of valid pixels EA, EB and EC for
coding made virtually even, i.e., they have the same quantity of image data to be processed, as shown in Fig. 12(b).
As a result, the maximum volume of process M' dealt with by the inventive apparatus becomes sufficiently
less than the maximum volume M of the conventional apparatus, and the process time required for each DSP
block is reduced.
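
The effect of the area adjustment can be illustrated with a short sketch. It assumes, purely for illustration, that the image is divided by rows and that the number of valid pixels per row is known; the function name `assign_areas` and the nearest-prefix-sum rule are hypothetical and are not taken from the task controller 7 itself.

```python
from itertools import accumulate

def assign_areas(valid_per_row, num_dsps):
    """Split rows into contiguous areas with similar valid-pixel totals.

    Returns a list of (start_row, end_row) pairs, end exclusive.
    Assumes len(valid_per_row) >= num_dsps.
    """
    prefix = [0] + list(accumulate(valid_per_row))  # valid pixels in rows [0, i)
    total = prefix[-1]
    rows = len(valid_per_row)
    bounds = [0]
    for k in range(1, num_dsps):
        target = total * k / num_dsps       # ideal cumulative load at boundary k
        lo = bounds[-1] + 1                 # at least one row per area
        hi = rows - (num_dsps - k)          # leave rows for the remaining areas
        best = min(range(lo, hi + 1), key=lambda i: abs(prefix[i] - target))
        bounds.append(best)
    bounds.append(rows)
    return list(zip(bounds[:-1], bounds[1:]))
```

With twelve rows holding 2, 2, 2, 2, 8, 8, 8, 8, 4, 4, 4, 4 valid pixels and three DSPs, an even split by rows gives loads of 8, 32 and 16, whereas this sketch gives areas of 5, 3 and 4 rows with loads of 16, 24 and 16, so the largest load drops from 32 to 24.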

Fig. 13 shows the inter-frame coder constructed by a serial connection of DSP blocks in three stages.
Each DSP block performs the process shown in Fig. 14. The first DSP block 12 receives the input data
1 and, after producing a differential signal, implements the valid/invalid judgment, evaluates the distribution
of the numbers of valid pixels in the image data, and sends the information to the task controller 7. Based
on the information, the task controller 7 issues address control signals 8 dictating such address
adjustment that the DSPs in the second DSP block 13 have even assignments of data. Each DSP in the
second DSP block 13 implements the process by adjusting the read address as described above. The third
DSP block 14 is designed to operate identically.

Although in the foregoing embodiment the DSP process assignment areas are controlled on the basis of 
the valid pixel distribution among areas in image data, the present invention is not confined to this scheme, 
but feedback DSP assignment control based on the general quantity distribution of transmitted information 
is also possible, for example.

A second embodiment of this invention will be explained with reference to the drawings. Fig. 15 shows
an example of the configuration of a digital signal processing apparatus according to the second embodiment
of this invention. In the figure, 301 is a data flow control section (DFC) working as a control means; 302 are
control parameter data output from the data flow control section 301; 303 is a common memory (CM)
which stores feedback data, large capacity data, tables, etc.; 304 is a task table (TB) which stores the
processing status of each signal processor element (PE) 318; 305 is a common bus (C-BUS) which
functions as a status communicating means and consists of at least a bus connected to the common memory
303, the task table 304 and each signal processor element 318; 306 is a video frame synchronizing signal
(Fp) which is supplied to the data flow control section 301 and discriminates the starting point of a video
frame in the case of inputting video signals etc.; 307 are feedback data (Fb), output from an output control
section 308 described later, which inform the data flow control section 301 of the occupancy status and data
quantity of a sending buffer etc. and of the completion of one frame of data processing etc.; 308 is an output
control section (OC) provided with a buffer memory for outputting data at a certain constant speed by
rearranging processed blocks output from a plurality of signal processor elements (PE) 318, for example into
the scanning order in a video frame; 309 is an input terminal of analog signals; 310 is an A/D converter; 311 are
digitized input data; 312 is a parameter memory (PM) consisting of dual port memories; 313 is an input
frame buffer consisting of dual port memories and functioning as a block formation means by memorizing
the input data 311 temporarily; 314 is a bus connecting the parameter memory 312 to the signal processor
elements 318; 315 is a bus connecting the input frame buffer 313 to the signal processor elements 318 in
order to supply data in a block unit; 316 is a common bus input/output port connected to the common bus
305; 317 is an interruption control port for sending/receiving timing control signals from the data flow control
section 301; 318 are individual signal processor elements (PE), which are provided with software
functioning as a starting means and are mutually connected with the buses 314 and 315, the last stage signal
processor element 318 and the output control section 308 also being connected with the buses 314 and 315;
319 is an output terminal through which data are output at a certain constant speed and timing from the
output control section 308; 320 is a multiprocessor module comprising the parameter memory 312, the input
frame buffer 313 and a plurality of signal processor elements 318 connected in series through the buses 314
and 315, for example.

The data flow control section 301 has a judgment means which scans the task table 304 at a certain
constant cycle and judges the processing conditions of the individual signal processor elements 318. The data
flow control section 301 also has a control means which, based on the result of the judgment means,
decides whether each signal processing module can process the next signal process block; when the
processing is found to be possible, it starts the process by sending out an interruption signal to the
interruption control port 317, and when the processing is found to be impossible, it instructs the transfer of
the signal process block to another signal processing module which can process the block. When parallel
processing is to be done, the constant cycle at which the task table 304 is scanned shall be the input cycle
of the signal process block multiplied by the degree of parallelism, and when series processing is to be
done, the scanning period shall be 1/n of the input cycle; thus, through synchronization
with the input data frame (for example a video frame), matching with real time can be maintained.
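
One scan of the task table can be pictured with the following sketch. It is a simplified software model under stated assumptions: the status names `IDLE`, `BUSY` and `DONE`, the dictionary used as the task table and the function `scan_task_table` are all hypothetical, and the returned pairs merely stand in for the interruption signals sent to the interruption control port 317.

```python
IDLE, BUSY, DONE = "idle", "busy", "done"

def scan_task_table(task_table, pending_blocks):
    """Run one scan cycle over the task table.

    task_table     -- dict mapping element id to its current status
    pending_blocks -- list of blocks waiting to be processed (consumed in order)
    Returns the list of (element_id, block) start commands issued in this scan.
    """
    started = []
    for element_id, status in sorted(task_table.items()):
        if not pending_blocks:
            break
        if status in (IDLE, DONE):           # element can accept the next block
            block = pending_blocks.pop(0)
            task_table[element_id] = BUSY    # status is updated when started
            started.append((element_id, block))
        # a BUSY element is skipped, so the block goes to another element
    return started
```

Blocks that cannot be started in one scan simply remain pending until the next scan, which is how the model keeps the busy elements from stalling the others.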

Fig. 16 shows an example of the internal constitution of the signal processor elements 318 shown in
Fig. 15. In the figure, 330 is a terminal to which the common bus input/output port 316 is to be connected;
331 is a terminal to which the interruption control port 317 is to be connected; 332 is a terminal to which the
buses 314 and 315 are to be connected; 333 is similarly a terminal which connects the buses 314 and 315
between the adjacent signal processor elements; 334 is an external bus control section (BUS-CONT) which
functions as a competitive control means to control the make/break of the connection to the common bus 305
through the input/output port 316; 335 is a bus for loading a writable control storage (WCS) 336, which
memorizes a signal processing program, from the external bus control section 334 at an initial time; 337 is a
BUSREQ which requests connection to the common bus 305 from the external bus control section 334; 338 is
a BUSACK which denotes the permission for the BUSREQ 337; 339 are command codes which are
successively read out from the writable control storage 336 according to the signal processing program; 340
is a digital signal processor (DSP) which executes data processing; 341 is an INTACK by which the digital
signal processor 340 informs an interruption control section (INTR-CONT) 345 of the reception of an
interruption; 342 is, conversely, an INTREQ which informs the digital signal processor 340 of a request for an
interruption; 343 is a bus which connects an internal bus 344 to the common bus 305 through the external bus
control section 334, the internal bus 344 being directly connected to the digital signal processor 340; 345 is
the interruption control section (INTR-CONT) which processes an interruption signal from the data flow
control section 301; 346 is a bus which writes the parameter of a processed data block on a dual port
memory 349 through the internal bus 344; 347 is similarly a bus which writes the processed block on the dual
port memory 349; 348 is a bus which connects a work memory in the dual port memory 349 and the
internal bus 344; 349 is the dual port memory, provided with a parameter memory, data memory and work
memory, which outputs data to the adjacent signal processor element 318 through the terminal 333 and
the buses 314 and 315.

Fig. 17 explains the internal control operation of the digital signal processing apparatus shown in Fig.
15, and the same parts as those shown in Fig. 15 are given the same symbols; the explanation of them is 
therefore omitted. 

In the figure, 351 is a block which shows the analytical operation of a parameter inside the signal processor
element 318; 352, 353 and 354 are blocks which show the operation of individual signal processing subroutines
A, B and C according to the parameter of each of them; 355 is a block which shows the contents of a
parameter memorized in the dual port memory 349; 356 is a block which shows the contents D of 
processed block data memorized in the dual port memory 349. 

Fig. 18 explains an example of the relation between the parameter data and process block data until a
data block is successively given a series of function processes and an output result is obtained through
series and parallel processes of block units executed in the digital signal processing apparatus shown in
Fig. 15. In the figure, 360 is a block address (BAD) showing the position of an input block in a frame; 361
is a processing number (PN) showing the kind of process to be given to said block; 362 is a flag (PFLG)
which discriminates the result of the process; 363 is a data block in which, for example, eight subblocks are
combined to form a block.
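
The way a parameter travels with its data block and selects the next process can be sketched as follows. The record layout, the three stand-in subroutines and the rule of incrementing the processing number are assumptions chosen for illustration; only the field meanings (BAD 360, PN 361, PFLG 362) come from the description above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BlockParams:
    block_address: int      # BAD 360: position of the block in the frame
    processing_number: int  # PN 361: kind of process to apply next
    flag: int               # PFLG 362: result of the previous process

# Hypothetical stand-ins for the signal processing subroutines A, B and C.
def process_a(data):
    return [x + 1 for x in data]

def process_b(data):
    return [x * 2 for x in data]

def process_c(data):
    return list(reversed(data))

SUBROUTINES = {0: process_a, 1: process_b, 2: process_c}

def run_element(params, data):
    """Analyze the parameter, run the selected subroutine, rewrite the parameter."""
    result = SUBROUTINES[params.processing_number](data)
    next_params = replace(
        params, processing_number=params.processing_number + 1, flag=1
    )
    return next_params, result
```

Passing the returned pair to `run_element` again models the next signal processor element reading the rewritten parameter and the processed block from the dual port memory.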

Fig. 19 shows an example of correspondence between the data block 363 shown in Fig. 18 and one 
video frame when a picture coding process is performed in this system. In the figure, 365 is one video 
frame; 366 is a data block obtained when a picture is divided into blocks of 16 lines x 16 pixels; 367 is a
subblock which is obtained when the block is further divided into 8 blocks of 4 lines x 4 pixels.

An explanation of the operation based on Fig. 15 is given in the following. Input data 311 digitized by an
A/D converter 310 are memorized in an input frame buffer 313, being scanned in raster form in
synchronization with the video frame synchronizing signal 306, for example. The input data 311 memorized
in the input frame buffer 313 are given initial parameter data 302 block by block by the data flow control
section 301, and the parameter data 302 are memorized in the parameter memory 312. The parameter
memory 312 and the input frame buffer 313 consist of dual port memories, and writing and reading are
simultaneously possible through the two independent ports.

Data blocks are read from the input frame buffer 313, and the parameter is read in a data block unit 
from the parameter memory 312. The data blocks and parameters are sent through the buses 314 and 315 to the
signal processor element 318, where they are given the first process of a series of functional processes in a
block unit. Next, the results and the rewritten parameters are written in the dual port memory 349 in the 
signal processor element 318. It is the basic function of a processor module 320 to execute processes 
successively between the adjacent signal processor elements 318 and to execute a pipeline processing for 
each block unit. 

When processing is executed for each block unit, if feedback data such as coded previous frame
data are to be referred to, the feedback data are input to the common memory 303 connected to the common
bus 305 and memorized there. The process of a new video frame is performed in such a way that a signal
processor element 318 other than the one which wrote the data through the common bus 305 refers to the
common memory 303. If the writing of the feedback data of the previous frame to the proper
position in the common memory 303 is not completed, the execution time of the process shall be specified.

When the processing of a unit (block processing) is finished, each signal processor element 318
memorizes the status showing the completion of the present processing in the task table 304, and waits for
the next processing. The data flow control section 301 scans the task table 304 and, when the processing of
the former stage signal processor element 318 is completed, sends out an interruption signal to said signal
processor element 318 and starts the next processing. By repeating this operation, the operation of each
signal processor element 318 is controlled.

To conduct parallel processing in a block unit for each processor module 320, the data processing 
condition in the input frame buffer 313 of each processor module 320 is detected with the status information 
of the initial stage signal processor element 318, and individual block data are distributed by proper load
distribution and input to each multiprocessor module 320.

These results are shown by the control parameter data of the initial stage, and the signal processor
element 318 discriminates the processing for the block by deciphering the above results and executes the
proper processing. These processes include, for example, the functions of a block
identifier 253, a coder 254, a local decoder 260, an inter-frame subtracter 252, a motion compensator 265,
an inter-frame adder 261 and a variable length coder 256, and besides them a process which performs only
load distribution, such as a process of transferring block data.

In the data flow control section 301, it is possible to make an arbitrary signal processor element 318
undertake an arbitrary process by controlling the first stage parameter; owing to this capability,
the load can be distributed to the signal processor elements 318 so as to make them work as efficiently as
possible.

The output control section 308 rearranges processed blocks, which are output at random times, into, for
example, the scanning order of an input video frame, produces the resultant output at an output terminal
319, and also produces feedback data 307 to inform the data flow control section 301 of these data.

The output control section 308 takes charge for example of a video multiplex section 257 and a 
transmitting buffer 258 shown in Fig. 5, and it outputs a feedback signal 269 from the transmitting buffer
258 to a coding control section 270, which corresponds to the data flow control section 301 shown in Fig. 15.

The data flow control section 301 takes charge of the above-mentioned load distribution function
and the functions of the coding control section 270 shown in Fig. 5, and it determines the block identification
control signal 273 and the coding control signal 274 and multiplexes them into the control parameter data for
the execution of the overall characteristic control. Referring to Fig. 16, the processing of a single signal
processor element 318 is started by the interruption from the data flow control section 301, and the contents
of the parameter memory 312 are input to it through an internal bus 344. On the basis of the discrimination
result of the contents, the processing of one unit of block data is performed by a digital signal processor 340.

The result and the rewritten parameters are written in a dual port memory 349, and the status is set in the
task table 304 through an external bus control section 334; thus the preparation for the next process is
completed. An interruption control section 345 interfaces the interruption from the data flow control section
301 with the digital signal processor 340. The parameter and the data written in the dual port memory 349 are
read by an adjacent signal processor element 318 which is connected to a terminal 333, and the next stage
process is given.

Fig. 17 shows the flow of these processes performed by the data flow control section 301, and it shows 
the relation between the control of writing/referring of feedback data to the common memory 303 and the 
control of status writing in the task table 304 by the data flow control section 301 through the common bus
305, and the start processing control in the signal processor element 318 by a parameter analyzer 351.

Fig. 18 shows the rewriting of the contents of control parameter data 302, which are added corresponding
to input block data 363, and the flow of these processes. A block address 360, which shows for example
the position in a frame or the time sequential order of a block, and a flag 362, which is referred to for the
kind of the next process and the contents of the next process, are contained in the control parameter data
302. The block address 360 is used for the discrimination of a special process in a certain case, for example
at an end point in a picture, or for the restructuring of data in the output control section 308 when a process
is finished. The flag 362 shows, for example, the results of the coding control information 271, the block
identification control signal 273, the coding control signal 274 and the block identifier 253 shown in Fig. 5.
Input block data 363 are set to have the minimum size handled in a unit processing. The motion
compensator 265 shown in Fig. 5 handles blocks of 16 x 16 size, and after the block identifier 253 blocks of
4 x 4 size are handled. In such a case, where the block size differs for each unit processing,
block sizes are arranged so that a maximum block size matches the subblock size contained in
it. In this case, eight pieces of 4 x 4 blocks are combined to constitute a 16 x 16 block. When coding of a
picture is performed, this block corresponds to a small picture element made by dividing one ordinary
frame into small square picture elements.

Fig. 19 shows an example where one video frame 365 is divided into a block 366 and subblocks 367.
In the above embodiment, a signal processor element 318 which has a single digital signal processor
340 is shown, but when higher speed processing is preferable a hierarchical structure combining a
plural number of digital signal processors can be used. The constitution of the signal processor element 318
in the case of the hierarchical structure is shown in Fig. 20. In this case, as the load for the data flow control
section 301 increases, a local data flow control section 370, a local common memory 371, and a local task
table 372 are provided inside the signal processor element 318 in order to locally execute the optimum load
distribution inside the signal processor element. The data flow of the digital signal processor 340 which is
connected to a local common bus 373 is the same as that shown in Fig. 15 except that the operation is executed
inside the signal processor element 318.

In the above embodiment, a series/parallel structure is adopted, but in some cases a completely parallel or
completely series structure is effective, depending on the purpose of the signal processing, and real time
processing may be possible.

The other embodiment of this invention is explained with reference to Fig. 21. In Fig. 21, 420, 421 and
422 are address generators for readout data; 423 is an address generator for writing data; 424, 425 and 426
are data memories, and address data generated by the address generator 423 are input to these memories;
427, 428 and 429 are data buses which transfer readout data from the data memories 424, 425 and 426;
430, 431 and 432 are registers for holding data transferred from the data buses 427, 428 and 429; 433 is a
register to hold the output of the register 432; 434 is a selector to select the output of the register 430 or
that of the register 433; 435 is a selector to select the output of the register 431 or that of the register 441;
the selector 434 and the selector 435 constitute a first selector group; 436 is a selector to select the output
of the register 430 or the output of a register 439; 437 is a selector to select the output of the register 431
or the output of the register 433; the selector 436 and the selector 437 constitute a second selector group;
438 is an operator which operates on the outputs of the selectors 434 and 435; 440 is a multiplier
which multiplies the outputs of the selectors 436 and 437; the register 439 holds the output of the operator
438; the register 441 holds the output of the multiplier 440; 442 is an output selector which selects the input
from the register 439 or the input from the register 441 and outputs it;
443 is an adder which adds the output of the output selector 442 and the output of an accumulator 444 and
outputs the sum to said accumulator 444; 445 is a data bus to transfer output data of the accumulator 444
and the output selector 442; 446 is an interface circuit which performs outputting/inputting of data to/from
external circuits; 451 - 453, 461 - 463 and 471 - 473 denote signal lines which output the outputs of the data
memories 424, 425 and 426 to the data buses 427, 428 and 429.

The operation is explained in the following. In Fig. 21, assume that data series with N elements, A =
(ai|i = 1 to N), B = (bi|i = 1 to N), C = (ci|i = 1 to N) are previously stored respectively in the data
memory 424, data memory 425, and data memory 426.

Under the above conditions, the procedure for performing an operation with three inputs and one output is
shown below. The operation processing flow is shown in Fig. 22.

To begin with, at a step ST31, the top addresses of the three series of input data and of an output result
storing memory are initially set by the address generators 420, 421 and 422. After that the address generators
are assumed to take simple increment actions.

The data memory 424 corresponds to the address generator 420; the data memory 425 corresponds to
the address generator 421; the data memory 426 corresponds to the address generator 422. The individual data
memories 424, 425 and 426 read out data based on the addresses of the address generators 420, 421 and 422.

Data are input to the three data buses 427, 428 and 429 (X-BUS, Y-BUS, Z-BUS) respectively from the data
memories 424, 425 and 426, so that, for the output from these data memories 424, 425 and 426 to
a specified data bus, only one output out of three is controlled to be effective, and the other two are controlled
to be in a high impedance state. In this case, the data on the data bus is limited to that of the
output which is made effective. For example, when the A data series is to be input to the register 430, the A
series data are output to the signal line 451, and the signal lines 461 and 471, which output data from the other
data memories 425 and 426 to the data bus 427, are in a high impedance state. The same applies to the
other data buses.

Each of these data series is set respectively in the registers 430, 431 and 432. The three data buses 427,
428 and 429 can each select data from the three data memories 424, 425 and 426, so that 3^3 (= 27) kinds of
data set combinations can be supplied to the registers 430, 431 and 432.

Two expressions as shown below are defined as three-input operations, and the
processing method for each is shown in the following:

(ai ⊕ bi) × ci   (7)
(ai × bi) ⊕ ci   (8)

where (x ⊕ y) expresses an arithmetic and logic operation for finding the result of addition or
subtraction, or the maximum value or minimum value, for two input data x and y, and (x × y) expresses
multiplication. The explanation of the operation processing flow of expression (7) is given in Table 3.
The mark "X" in the table represents an unknown.
At a step ST32, the selector 434 selects the side of the register 430 and the selector 435 selects the side of
the register 431. By the use of these two selected data (ai and bi) the operation (ai ⊕ bi) is performed with
the operator 438, and the result is stored in the register 439. This value is output from the register 439 in the
next step.

The data ci in the register 432 are delayed by the register 433 by one step. In the next step the selector
436 selects the side of the register 439 and the selector 437 selects the side of the register 433. By the use of
these two data, (ai ⊕ bi) is multiplied by ci with the multiplier 440 and the result (ai ⊕ bi) × ci is stored in the
register 441. This value is output from the register 441 in the next step. With the output selector 442
selecting the register 441, the data (ai ⊕ bi) × ci are sent to one of the data memories 424, 425 and 426
through the data bus 445 based on the address shown by the address generator 423.

In this invention, readout of data, execution of operation and writing of data are continuously executed
by pipeline processing, so that the individual sections can operate in parallel. Therefore if the
three input one output operation is executed for a data series with N elements, a period of (N + 3) cycles is
required from the time when the first datum is read out until the time when the processing result of the last
datum is written into a memory.
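
The (N + 3) cycle count can be checked with a small software model of the pipeline for expression (7). This is a sketch under stated assumptions rather than a description of the circuit: the function name `run_pipeline`, the choice of addition as the default two-input operation, and the reduction of the data path to four stages (read, operator, multiplier, write) are all made for illustration.

```python
def run_pipeline(a, b, c, op=lambda x, y: x + y):
    """Simulate (a_i op b_i) * c_i through a four-stage pipeline.

    Returns (results, cycles), where cycles counts from the first read
    to the write of the last result.
    """
    n = len(a)
    read = alu = mul = None   # pipeline registers between the stages
    results = []
    cycles = 0
    index = 0
    while len(results) < n:
        cycles += 1
        # Stages are updated back to front so that each stage sees the
        # value latched by the preceding stage in the previous cycle.
        if mul is not None:
            results.append(mul)                  # write stage
        # multiplier stage (models multiplier 440 -> register 441)
        mul = alu[0] * alu[1] if alu is not None else None
        # operator stage (models operator 438 -> register 439; c is delayed)
        alu = (op(read[0], read[1]), read[2]) if read is not None else None
        if index < n:
            read = (a[index], b[index], c[index])  # models registers 430-432
            index += 1
        else:
            read = None
    return results, cycles
```

For three elements the model returns after six cycles, and for a single element after four, in agreement with the (N + 3) figure.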

The explanation of operation processing flow of expression (8) is given in Table 4. The mark "x" in 
Table 4 represents an unknown. 
The operation in which three input data are read out to the registers 430, 431 and 432 is the same as that in
the case of expression (7). When the operation of expression (8) is executed, the selector 436 selects the
side of the register 430 and the selector 437 selects the side of the register 431, and the operation (ai × bi)
is performed by the multiplier 440 and the result is set in the register 441.




EP 0 329 151 A2 



In the next step, the selector 434 selects the side of the register 433 and the selector 435 selects the
side of the register 441, and the operation (ai × bi) ⊕ ci is executed by the operator 438 and the result is set
in the register 439. In the next step, with the output selector 442 selecting the side of the register 439, the
selected result is written into one of the data memories 424 to 426.

Thus the operation of expression (8) proceeds in the same way as that of expression (7), and therefore the
total processing time is (N + 3) cycles.

In the case of a two input one output operation, the value of (ai ⊕ bi) can be obtained through the
following procedure: the selector 434 selects the side of the register 430 and the selector
435 selects the side of the register 431, and after the operation is executed by the operator 438 the side of
the register 439 is selected by the output selector 442 in the next step. The value of (ai × bi) can be obtained
through the following procedure: the selector 436 selects the side of the register 430 and
the selector 437 selects the side of the register 431, and after the execution of the operation with the
multiplier 440 the output selector 442 selects the side of the register 441 in the next step.

The processing speed in the case of three input one output is (2N + 10)/(N + 7) times that of the prior
art; that is, the processing time is almost halved if N is a large number.
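
As a quick numerical check, the ratio can be evaluated for several values of N. This sketch reads the ratio as (2N + 10)/(N + 7), which is an assumption about the grouping of the printed expression; the function name is hypothetical.

```python
def speed_ratio(n):
    """Speed relative to the prior art, read as (2N + 10) / (N + 7)."""
    return (2 * n + 10) / (n + 7)


for n in (1, 10, 100, 1000):
    print(n, round(speed_ratio(n), 3))
```

The ratio approaches 2 as N grows, which corresponds to roughly half the processing time.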

When a cumulative value is to be found in the three input one output operation, an intermediate
cumulative value or an initial value is stored in the accumulator 444; each successive operation result
is added to the value in the accumulator 444 by the adder 443, and the sum is stored in the
accumulator 444 again. These processes are performed repeatedly. The cumulative operation therefore
does not increase the number of processing cycles.
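
The accumulation can be sketched in software as follows. This is a minimal illustration, assuming the expression (8) form (ai x bi) combined with ci and assuming addition as the operation of the operator 438; the function and parameter names are hypothetical.

```python
def accumulate(a, b, c, initial=0, op=lambda x, y: x + y):
    """Add each result (a[i] * b[i]) op c[i] to a running accumulator."""
    acc = initial                      # initial or intermediate cumulative value
    for ai, bi, ci in zip(a, b, c):
        acc += op(ai * bi, ci)         # adder output is stored back each step
    return acc


print(accumulate([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Passing a non-zero `initial` models resuming from a cumulative value obtained partway through.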

Fig. 23 shows a flow chart of a method for motion compensative operation according to an
embodiment of this invention. Fig. 24 is a drawing explaining the intermediate check method in
the distortion quantity operation of this invention. Fig. 25 shows the disposition of pixel samples at the
sample points in a block in the intermediate check method for the distortion operation of this invention.

Before the operation process, for the first block among the M candidate blocks for search in the
previous frame data, the distortion quantity over all the pixels in the block is measured; this distortion
quantity is defined to be the initial minimum distortion D. The sum of differential absolute values is
adopted as the distortion quantity. In the distortion quantity operation for the second and subsequent
blocks, the differential absolute values of all pixels need not always be calculated: if the intermediate
distortion quantity exceeds a certain value at an intermediate check point, it is judged that the ultimate
distortion quantity of the block cannot be smaller than the minimum distortion D, and the distortion
quantity operation for the residual part is stopped.

A block which gives the minimum distortion is detected by calculating the degree of approximation
between the patterns, using the differences of pixels and their accumulation, for the respective M blocks
which are selected out of the present input frame and the previous input frame (M is a positive integer). The
number of pixels used for the calculation of the degree of approximation is K at the maximum (K is an
integer greater than or equal to one and smaller than or equal to the total number of pixels in
one block). During the calculation of the degree of approximation, while the number of pixels referred to
is less than K, intermediate checks are performed four times, with an intermediate check point
provided at every 1/4 of the sample points. Fig. 25 shows examples of sample points used for the distortion
quantity operation. The mark O expresses a first time sample point for distortion quantity operation; the
mark x expresses a second time sample point for distortion quantity operation; the mark A expresses a third
time sample point for distortion quantity operation; the mark @ expresses a fourth time sample point for
distortion quantity operation.
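
One way to spread each quarter of the sample points evenly over a block is a 2 x 2 interleave. The sketch below is an assumption for illustration only: the actual disposition is the one drawn in Fig. 25, which is not reproduced here, and the function name is hypothetical.

```python
def sample_phases(height, width):
    """Split block pixel coordinates into four interleaved sample phases.

    Assumed layout: phase index = (row parity) * 2 + (column parity).
    """
    phases = [[], [], [], []]
    for y in range(height):
        for x in range(width):
            phases[(y % 2) * 2 + (x % 2)].append((y, x))
    return phases


for number, points in enumerate(sample_phases(4, 4), start=1):
    print(number, points)
```

Each phase then covers the whole block at a quarter of the density, so an early partial sum already reflects every region of the block.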

In Fig. 24, when the total number of sample points is assumed to be K, the threshold levels at the
first, second and third intermediate check points are expressed as d1', d2' and d3', and are set as
d1' = D/4 + th1 (9-1)
d2' = D/2 + th2 (9-2)
d3' = 3D/4 + th3 (9-3)

where th1, th2 and th3 can be set independently. The distortion quantities at the first, second and third
intermediate check points are expressed as di1, di2 and di3.
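
Expressions (9-1) to (9-3) can be written as a small helper. This is a sketch under the assumption that D is the current minimum distortion and that the three offsets default to zero; the function name is hypothetical.

```python
def check_thresholds(d_min, th1=0, th2=0, th3=0):
    """Return the thresholds at the 1/4, 1/2 and 3/4 check points."""
    return (d_min / 4 + th1,       # expression (9-1)
            d_min / 2 + th2,       # expression (9-2)
            3 * d_min / 4 + th3)   # expression (9-3)


print(check_thresholds(100))
print(check_thresholds(100, th1=1, th2=2, th3=3))
```

Larger offsets make the intermediate checks more tolerant, so fewer blocks are canceled early.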

In this case, di1 expresses the value of the first time distortion quantity in Fig. 25; di2 expresses a
cumulative value, di1 plus the second time distortion quantity; di3 expresses a cumulative value, di2 plus
the third time distortion quantity. Therefore, the cumulative value in which the fourth time distortion quantity
is accumulated becomes the distortion quantity in which all sample points are included.

On the basis of the distortion quantity judgment at an intermediate check point, if a block is estimated to
have a large distortion quantity, the checking of the block is canceled before the block reaches the last
check point, to save useless operation processes. In other words, if the distortion quantity di1, which is
obtained by the distortion calculation at the K/4 sample point in step ST41, is found to be di1 > d1' by the
judgment in the next step ST42, the block is canceled; if not, the operation continues to the next step
ST43, where the distortion quantity di2 is obtained by the distortion calculation at the K/2 sample
point. If it is found that di2 > d2' in the judgment in the next step ST44, the block is canceled; if not, the
operation continues into step ST45, where the distortion quantity di3 is obtained by the
calculation at the 3K/4 sample point. If it is found that di3 > d3' by the judgment in the next step ST46, the
block is canceled; if not, the operation continues into step ST47, where the distortion quantity di at the K
sample point is calculated for comparison and renewal.

As shown above, if the processing is performed through the last step, the same result is obtained as
that obtained with the conventional method in which all the pixels are used for the distortion operation. If
the distortion quantity di in this case is smaller than the minimum distortion D, the value of the minimum
distortion D is renewed to di and the motion vector index is renewed to the index i. The final minimum
distortion D and the vector index i which shows the movement giving D are obtained by
repeating the above operating processes as many times as the number of searching vectors, until the
process reaches the Mth block.
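
The search procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes each block is given as a flat list of K sample pixels already ordered by sample time, so that the four quarters of the list correspond to the first to fourth time sample points, and it uses the sum of differential absolute values. All function names are hypothetical.

```python
def checked_distortion(cur, ref, d_min, th=(0, 0, 0)):
    """Return the full distortion, or None if an intermediate check cancels."""
    k = len(cur)
    limits = (d_min / 4 + th[0], d_min / 2 + th[1], 3 * d_min / 4 + th[2])
    total = 0
    for q in range(4):
        lo, hi = q * k // 4, (q + 1) * k // 4
        total += sum(abs(c - r) for c, r in zip(cur[lo:hi], ref[lo:hi]))
        if q < 3 and total > limits[q]:    # checks in ST42, ST44, ST46
            return None                    # block canceled
    return total                           # di over all K sample points


def search(cur, candidates, th=(0, 0, 0)):
    """Return (index, distortion) of the best block among the candidates."""
    # First candidate: all pixels are measured to obtain the initial D.
    best_i = 0
    d_min = sum(abs(c - r) for c, r in zip(cur, candidates[0]))
    for i in range(1, len(candidates)):
        d = checked_distortion(cur, candidates[i], d_min, th)
        if d is not None and d < d_min:    # comparison and renewal in ST47
            d_min, best_i = d, i
    return best_i, d_min


cur = [10] * 16
candidates = [[12] * 16, [10] * 16, [20] * 16]
print(search(cur, candidates))
```

In this example the third candidate is canceled at the first check point, so three quarters of its pixel differences are never computed. Because the intermediate thresholds are estimates, a block canceled early might occasionally have ended with a distortion below D; the offsets in `th` trade that risk against the amount of computation saved.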

In the above embodiment, the example where differential absolute value sum is used for distortion 
quantity operation is shown, but differential square value sum can also be used. 

In the above embodiment, the explanation concerns the case of motion compensative operation, but
the method can also be applied to an inner product vector quantization operation with the same
effect. In that case, when an operation result is compared with a threshold value at an intermediate
check point, the relation in magnitude is opposite to that in the above embodiment.

Claims 


1. A digital signal processing apparatus comprising memory means which stores input information and
reads out the stored information; a task controller which finds the quantity of valid information among
information to be processed and produces address control signals for dividing the information to be
processed into a plurality of divisions in a sense of adaptation so that a plurality of assigned quantities of
valid information are even; and a plurality of digital signal processors which receive the address control
signals from said task controller to adjust addresses of assigned areas to be processed, read information
out of said memory means to implement coding processes, and output the results of processes.

2. A digital signal processing apparatus according to claim 1, wherein said memory means comprises a
plurality of input memories provided in correspondence to said digital signal processors. 

3. A digital signal processing apparatus comprising a multiprocessor module including a plurality of
signal processors in connection, each of said signal processors including an instruction memory which
stores a sub-program that describes a functional process of a signal processing operation that is a
combination of the functional processes for data blocks formed of a plurality of data, said instruction
memory being accessible for writing from outside
and connected through an internal bus with a data memory which is used for the execution of said signal
processing operation;

a digital signal processor which executes any of said functional processes in units of data blocks in 
accordance with a sub-program stored in said instruction memory; 

block forming means which forms a signal processing block of one unit by appending, for each data block,
control parameters including the type of functional process to be executed, a block address indicative of the
position in time and spatial domains and the order in time and spatial domains of said data block and 
information indicative of post-process attributes of said data block; 

activation control means which analyzes said control parameters to activate each unit of said functional 
process indicated by said parameter; 
an interrupt controller which controls the timing of execution of said activation control means in response to
an external interrupt; 

status indication means which indicates to the outside as to whether said functional process is in execution; 
a data input bus which reads out a unit of signal processing block from an external data memory by way of 
said internal bus for the execution of said functional process; 
at least one dual-port memory capable of reading and writing independently on both ports, with one port
being connected to said internal bus and adapted to write a unit of signal process block resulting from said 
functional process, and with another port being opened to the outside; 

and an external bus controller including a bus contention control means which connects said internal bus to
a common bus consisting of at least one data bus provided externally only when the common bus is not
used by an external device and implementing data transfer for the unit of signal processing block or
arbitrary quantity of data;

and a transfer control means which performs data transfer asynchronously in units of signal processing
block by linking adjoining ones of said signal processors in a serial and/or parallel arrangement by
connecting the externally-opened port of said dual-port memory in one processor to said data input bus of
another processor; an input frame buffer of dual-port memory which forms a digital signal into blocks and
writes the signal on one port in units of frame or block on a realtime basis and implements data input by
connecting another port to said input bus in the first-stage signal processor in said multiprocessor module;

at least one common memory which is connected to said common bus and adapted to transact data in
units of signal processing block or arbitrary number of data with all of said signal processors; a task table
which memorizes said status indication means in said signal processors; an output controller which reads
out the last-processed signal processing block written in said dual-port memory in the last-stage signal
processor in said multiprocessor module, rearranges the block in accordance with said processing
parameters so as to be in compliance with the position in time and spatial domains and the order in time
and spatial domains, stores the rearrangement result temporarily in a buffer memory and outputs the buffer
memory contents at a constant quantity per unit time; a data flow controller which scans the contents of
said task table at a constant interval, determines the process assignments of said signal processors on the
basis of feedback information such as the degree of occupancy of said buffer memory indicated by said
output controller, and activates the interrupt controller of each said signal processor; and a writing means
which newly generates said processing parameters for each signal processing block entered newly to said
input frame buffer and writes the parameters in corresponding positions of input frame buffer.

4. A digital signal processing apparatus according to claim 3, wherein said data flow controller
comprises judgment means which judges the processing status of each processor by scanning said task
table at a constant interval; and first control means which determines on the basis of the result provided by
said judgment means as to whether each signal processing module can process a next signal processing
block, and, if possible, issues an interrupt signal to said interrupt controller to initiate processing or, if
impossible, directs a signal processing module, which can have a process, to transfer said signal
processing block.

5. A digital signal processing apparatus according to claim 4, wherein said judgment means performs
scanning, in case of parallel processing in a constant period, in a time length which is the input period of
said signal processing block multiplied by the number of parallel processings, or, in case of serial
processing, in a time length which is the input period divided by an integer greater than or equal to one,
and implements matching with real time by being in synchronism with input data frames.

6. A digital signal processing apparatus according to claim 3, 4 or 5, wherein said data flow controller
includes second control means, in which a piece of image data is divided into small rectangular blocks to
form said data blocks, the size of data block is made equal to a maximum or minimum size dealt with by
said functional processes and positions of small blocks in said piece of image data are used as spatial
position information of said process parameters, and, in case of inter-frame coding for a moving image, a
frame memory for storing a coded previous frame image is said common memory and a signal processing
block processed by a signal processor unit is written in the position of the common memory by way of said
common bus thereby to form feedback data, and a new image frame is processed by making reference to
said feedback data from another signal processing block by way of the common bus; and third control
means which, if feedback data of the previous frame has not yet been written in the position in the common
memory, dictates an execution wait for the process.

7. A digital signal processing apparatus according to claim 6, wherein a plurality of digital signal
processors are connected in parallel through a local common bus, a local signal processor is formed of a
local data flow controller which performs only process activation control for said digital signal processors, a
local common memory which can be accessed commonly by said digital signal processors, and a plurality
of digital signal processors, a plurality of local signal processors being connected to complete said signal
processors in a hierarchical structure.

8. A digital signal processing apparatus according to claim 7, wherein said multiprocessor module is
one in number.

9. A digital signal processing apparatus according to claim 7, wherein said multiprocessor module is
more than one in number.

10. A digital signal processing apparatus comprising a first through third address generators which
generate independently addresses for read data; a fourth address generator which generates write address
information indicative of a write data destination and address; a first through third data memory, from which
data are read out in accordance with addresses of said first through third address generators and to which
data are written in accordance with address information of said fourth address generator; an operator which
performs an arithmetic/logic operation for a pair of data selected by a first selector group under control of
microprogram from among a first data pair consisting of data of said first data memory and data of said
second data memory and a second data pair consisting of data of said third data memory and output data
of a multiplier; a second selector group which selects one of said first data pair, and a third data pair
consisting of data of said third data memory and output data of said operator under control of said
microprogram so that a selected data pair is subjected to a multiplication operation by said multiplier; an
output selector which selects one of the output of said operator and the output of said multiplier and
transfers the selected output data over said first through third data memories or a data bus to an
external circuit; and an accumulator which provides an output for the addition operation by an adder for
cumulation of output value, holds cumulatively the result of addition by said adder, and transfers the output
as write data to said first through third data memories or an external circuit.

11. A digital signal processing apparatus according to claim 10, wherein said first through third address
generators have an auto-increment mode for addressing.

12. A motion compensative operation method wherein a current input frame of digital image data
consisting of a plurality of frames entered successively is divided into a plurality of blocks, and, in detecting
a block which provides a minimal distortion resulting from computation of resemblance between patterns
based on the cumulation of differential absolute values or differential square values of pixels in block
between a block of the current input frame and blocks of M in number (M is a positive integer) of the
previous frame, a maximum of K (K is an integer greater than or equal to one and smaller than or equal to
the total number of pixels in one block) pixels are used for the pattern resemblance
computation, intermediate checks are conducted n times (n is an integer greater than or equal to one)
during the resemblance computation at time points when the number of reference pixels is smaller than K, a
block is determined to be outside the range of comparison for finding a minimal distortion block if a
cumulative value at each time point is greater than a threshold value preset for each intermediate check
time point, and a block having a minimal distortion is detected by computation from among blocks whose
cumulative values are below the threshold values at all time points of intermediate checks,